8  Introduction to Multivariate Analysis

8.1 Introduction

Multiple linear regression, which was covered in the previous section, is an example of ‘multivariate analysis’.

Multivariate analysis is a branch of statistics that deals with the examination of more than two variables simultaneously. It’s like looking at a complex puzzle where each ‘piece’ is a different variable, and we’re trying to see how they all fit together.

This type of analysis is crucial when we want to understand relationships between multiple factors at once, rather than just looking at them in pairs (as in simple linear regression).

It’s the most common form of analysis you’ll encounter, so we’re going to spend a considerable chunk of this module developing our understanding of some of the techniques that make up ‘multivariate analysis’.

8.2 Types of multivariate analysis

Over the next few weeks, we’re going to explore a number of different statistical techniques that fall under the heading of multivariate analysis. We’ll cover:

  • Factor analysis, which is used to uncover underlying factors or themes from a large set of variables. Imagine you have a long list of questions from a survey; factor analysis helps you find out if some of these questions are actually asking about similar things.

  • Cluster analysis, which groups a set of objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other groups. It’s like sorting a mixed bag of sweets into piles where each pile contains similar types of sweets.

  • Discriminant analysis, which is used when you have groups and want to know what variables make these groups different. It’s like finding out what separates apples from oranges in a fruit basket based on features like colour, size, and taste.

  • Canonical correlation analysis, which looks at relationships between two sets of variables. It’s like having two different languages and trying to find common words or phrases that can be understood in both.

  • MANOVA (Multivariate Analysis of Variance) and MANCOVA (Multivariate Analysis of Covariance), which are like advanced versions of ANOVA and ANCOVA. They deal with multiple dependent variables at the same time, checking whether group means on these variables differ. MANCOVA additionally takes other variables (covariates) into account.

  • Path Analysis and Structural Equation Modelling. Path analysis is used to understand the direct and indirect relationships between variables in a diagrammatic way. Structural Equation Modelling (SEM) is more complex and combines factor analysis and path analysis to understand the structure of relationships between multiple variables.
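To make one of these techniques concrete, here is a minimal sketch of cluster analysis using k-means, one common clustering method. This section does not say which algorithm the module will use, so treat that choice as an assumption. The sweets data, the two features, and the starting centroids are all made up for illustration.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, n_iter=20):
    """Tiny k-means: alternate assignment and centroid updates."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid is the mean of the cluster's members.
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        if updated == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical sweets, each scored on (sweetness, sourness).
sweets = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
          (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]

# Assumption: start with one sweet from each end of the bag as a centroid.
labels, centroids = kmeans(sweets, [sweets[0], sweets[3]])
print(labels)
print(centroids)
```

Each sweet ends up with a pile number in `labels`, and `centroids` gives the "average sweet" in each pile. Real analyses would normally use a statistics package and would need a principled way of choosing the number of clusters.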

Important

If your memory of ANOVA is cloudy, please make sure you revise it in the next week or so! See Section 91.1.

8.3 Assumptions

Like the other techniques we have covered already in this module, multivariate analysis relies on several assumptions:

  • Normality: The data should follow a normal distribution. There should not be an obvious positive or negative skew in the data.

  • Linearity: The relationship between variables should be linear. That is, it should be reasonable to assume that variables rise or fall at a consistent pace. See

  • Homoscedasticity: The variance within each group should be similar across the groups. See Section 7.2.3.

  • Absence of Multicollinearity: The variables shouldn’t be too highly correlated with each other.

Note

Remember: Multicollinearity is like when you’re trying to figure out what influences your exam grades, and you look at things like how much you study, how much you sleep, and your attendance. But here’s the catch: these things are all connected. If you study a lot, you might sleep less, and if you’re always at university, you’re probably studying more.

In statistics, when we’re trying to understand how different things affect something else (like your grades), multicollinearity is when the things we’re looking at (like study time, sleep, attendance) are so intertwined that it’s hard to tell which one is actually making the difference. It’s like trying to listen to multiple people talking at once and trying to understand each person’s message clearly.
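The intuition in this note can be checked numerically. The sketch below computes pairwise Pearson correlations between three predictors and flags any pair whose correlation is above a cut-off. The data are invented, and the 0.8 cut-off is an illustrative assumption, not a rule from this module.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up predictor data for six students.
predictors = {
    "study_hours": [2, 4, 6, 8, 10, 12],
    "sleep_hours": [8, 6, 9, 7, 8, 6],
    "attendance": [55, 60, 72, 80, 88, 95],
}

THRESHOLD = 0.8  # assumption: illustrative cut-off only

correlations = {}
flagged = []
for a, b in combinations(predictors, 2):
    r = pearson(predictors[a], predictors[b])
    correlations[(a, b)] = r
    if abs(r) > THRESHOLD:
        flagged.append((a, b))

for pair, r in correlations.items():
    print(pair, round(r, 2))
print("Possible multicollinearity:", flagged)
```

In this invented data set, study time and attendance move almost in lockstep, so a model containing both would struggle to separate their individual effects on grades.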

These assumptions are crucial for the validity of the analysis. If they’re not met, the results might not be reliable.
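As a rough illustration of two of the other checks listed above, the sketch below computes a skewness coefficient (for the normality assumption) and compares the sample variances of two groups (for homoscedasticity). The data and function names are made up for illustration, and in practice you would normally rely on a statistics package and formal tests, not on hand-rolled code like this.

```python
import statistics


def skewness(values):
    """Moment-based skewness: about 0 for symmetric data,
    positive for a long right tail, negative for a long left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [48, 50, 52, 49, 51, 50]
right_skewed = [1, 1, 2, 2, 3, 15]

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_skewed)
print("skew (symmetric):", round(skew_symmetric, 2))
print("skew (right-skewed):", round(skew_right, 2))

# Hypothetical groups with the same centre but very different spread.
group_a = [10, 12, 11, 13, 12]
group_b = [9, 15, 6, 18, 12]

var_a = statistics.variance(group_a)  # sample variance
var_b = statistics.variance(group_b)
variance_ratio = max(var_a, var_b) / min(var_a, var_b)
print("variance ratio:", round(variance_ratio, 1))
```

A skewness value far from zero, or a variance ratio far from one, is a prompt to look more closely at the data before trusting the results of the analysis.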

8.4 Challenges

Multivariate analysis can be quite challenging because of its:

  • Complexity: The methods are mathematically complex and require a good understanding of statistics.

  • Data Requirements: These methods often need large datasets to be effective.

  • Interpretation: Results can be difficult to interpret, especially in methods like SEM where multiple relationships are analysed simultaneously.

Despite these challenges, multivariate analysis is a powerful tool in research, offering insights into complex relationships that simpler methods can’t provide. As with any tool, its effectiveness depends on your skill and understanding.

8.5 Reading

For further reading, I’d recommend you take a look at the following book, which is available via the University library through the module reading list on myplace:

  • Härdle, W., & Simar, L. (2015). Applied multivariate statistical analysis (4th ed.). Berlin: Springer.